Supervised Machine Learning for Summarizing Legal Documents
Abstract
This paper presents a supervised machine learning approach for summarizing legal documents. A commercial system for the analysis and summarization of legal documents provided us with a corpus of almost 4,000 text and extract pairs for our machine learning experiments. That corpus was pre-processed to identify the selected source sentences in extracts from which we generated legal structured data. Finally, we describe our sentence classification experiments, which rely on a Naive Bayes classifier using a set of surface, emphasis, and content features.
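The pipeline outlined in the abstract (label source sentences by whether they appear in the extract, then train a Naive Bayes sentence classifier) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' system: the toy document, the cue-word and legal-term lists, the four binary features, and the names `extract_features` and `BernoulliNaiveBayes` are all assumptions made for the example, since the abstract does not specify the actual feature set.

```python
import math

# Hypothetical word lists; the paper's real emphasis/content features are not given.
CUE_WORDS = {"held", "accordingly", "therefore"}
LEGAL_TERMS = {"court", "appeal", "tribunal", "review"}

def tokenize(sentence):
    return [w.strip(".,;:") for w in sentence.lower().split()]

def extract_features(sentence, position, total):
    """Map a sentence to illustrative binary features."""
    words = tokenize(sentence)
    return {
        "surface_first_third": position < total / 3,           # surface: position
        "surface_long": len(words) >= 8,                        # surface: length
        "emphasis_cue": any(w in CUE_WORDS for w in words),     # emphasis: cue words
        "content_legal": any(w in LEGAL_TERMS for w in words),  # content: domain terms
    }

class BernoulliNaiveBayes:
    """Naive Bayes over binary features with add-one (Laplace) smoothing."""

    def fit(self, rows, labels):
        self.classes = sorted(set(labels))
        self.features = sorted(rows[0])
        self.log_prior, self.p_true = {}, {}
        for c in self.classes:
            members = [r for r, y in zip(rows, labels) if y == c]
            self.log_prior[c] = math.log(len(members) / len(labels))
            self.p_true[c] = {
                f: (sum(r[f] for r in members) + 1) / (len(members) + 2)
                for f in self.features
            }
        return self

    def predict(self, row):
        def log_score(c):
            s = self.log_prior[c]
            for f in self.features:
                p = self.p_true[c][f]
                s += math.log(p if row[f] else 1 - p)
            return s
        return max(self.classes, key=log_score)

# Toy text/extract pair (invented for illustration).
document = [
    "The court held that the appeal must be dismissed.",
    "The hearing took place on a Tuesday.",
    "Accordingly, the tribunal erred in law and the decision is set aside.",
    "Counsel arrived late.",
    "The weather was cold that day.",
    "Therefore, the application for judicial review is allowed.",
]
extract = {document[0], document[2], document[5]}

# Pre-processing step: a source sentence is a positive example if the extract selected it.
labels = [s in extract for s in document]
rows = [extract_features(s, i, len(document)) for i, s in enumerate(document)]
model = BernoulliNaiveBayes().fit(rows, labels)

new_doc = [
    "Accordingly, the court dismissed the appeal with costs to the respondent.",
    "The parties exchanged letters.",
    "Lunch was served at noon.",
]
predictions = [
    model.predict(extract_features(s, i, len(new_doc))) for i, s in enumerate(new_doc)
]
```

Here `predictions` holds one boolean per sentence of `new_doc`, indicating whether the sketch would select that sentence for the summary. Exact string matching is used to align extract sentences with source sentences only because the toy extract copies them verbatim; a real corpus would likely need fuzzier alignment.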
Similar Resources
Machine Learning Approaches for Catchphrase Extraction in Legal Documents
The purpose of this research was to automatically extract catchphrases from a given set of legal documents. We focused mainly on machine learning approaches, comparing unsupervised and supervised methods to see which of the two was better for automatic catchphrase extraction...
Digital Learning for Summarizing Arabic Documents
We present in this paper an automatic summarization method of Arabic documents. This method is based on a numerical approach which uses a semi-supervised learning technique. The proposed method consists of two phases. The first one is the learning phase and the second is the use phase. The learning phase is based on the Support Vector Machine (SVM) algorithm. In order to evaluate our method, we...
A Machine Learning Approach to Identifying Sections in Legal Briefs
With an abundance of legal documents now available in electronic format, legal scholars and practitioners are in need of systems able to search and quantify semantic details of these documents. A key challenge facing designers of such systems, however, is that the majority of these documents are natural language streams lacking formal structure or other explicit semantic information. In this re...
Using Non-Lexical Features to Identify Effective Indexing Terms for Biomedical Illustrations
Automatic image annotation is an attractive approach for enabling convenient access to images found in a variety of documents. Since image captions and relevant discussions found in the text can be useful for summarizing the content of images, it is also possible that this text can be used to generate salient indexing terms. Unfortunately, this problem is generally domain-specific because indexi...
Named Entity Recognition and Resolution in Legal Text
Named entities in text are persons, places, companies, etc. that are explicitly mentioned in text using proper nouns. The process of finding named entities in a text and classifying them by semantic type is called named entity recognition. Resolution of named entities is the process of linking a mention of a name in text to a pre-existing database entry. This grounds the mention in something...